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Abstract 

We present two models for estimating the probabilities of future earthquakes in California, to be tested 
in the Collaboratory for the Study of Earthqu ake Predictability (CSEP). The first, time-independent 



model, modified from 



Helmstetter et al 



2007], provides five-year forecasts for magnitudes m > 4.95. 
We show that large quakes occur on average near the locations of small m > 2 events, so that a high- 
resolution estimate of the spatial distribution of future large quakes is obtained from the locations of 
the numerous small events. We employ an adaptive spatial kernel of optimized bandwidth and assume 
a universal, tapered Gutenberg-Richter distribution. In retrospective tests, we show that no Poisson 
forecast could capture the observed variability. We therefore also test forecasts using a negative binomial 
distribution for the number of events. We modify existing likelihood-based tests to better evaluate the 
spatial forecas t. Our time-dependent model, an Epidemic Type Aftershock Sequence (ETAS) model 



modified from 



Helmstetter et al 



20061 ] . provides next-day forecasts for m > 3.95. The forecasted rate is 
the sum of a background rate, proportional to our time-independent model, and of the triggered events 
due to all prior earthquakes. Each earthquake triggers events with a rate that increases exponentially with 



1 



its magnitude and decays in time according to Omori's law. An isotropic kernel models the spatial density 
of aftershocks for small (< 5.5) events. For larger quakes, we smooth early aftershocks to forecast later 
events. We estimate parameters by optimizing retrospective forecasts. Our short-term model realizes a 
gain of about 6.0 over the time-independent model. 



1 Introduction 



A wide range of ideas and hypotheses exist about how, when and where earthquakes occur and about 
how big they will be. Given seismicity's strong stochasticity, along with data quality and quantity 
issues, the most promising path towards weeding out the good ideas from the bad ones is rigorous, 
comparative and transparent evaluation of prospective, synoptic and probabilistic earthquake forecasts. 
Such forecasts make scientific hypotheses of earthquake occurrence testable, transparent and refutable. 
A major step along this path was taken by the Working Group on Regional Earthquake Likelihood 
Models (RELM), which i nvited long- t erm (5-year) forecasts for California in a specific format to facili 



tate comparative testing 



Schorlemmer et al 



Field 



2007 



Schorlemmer et al 



2007 



Schorlemmer and Gerstenberaer 



2007 



2009I |. Building on RELM's success, the Collaboratory for the Study of Earthquake 
Predictability (CSEP, www . csepte sting . or g ) inherited and expanded RELM's mission to region ally and 



globally test prospective forecasts 



Jordan 



2006 



Schorlemmer et al 



2009 



Zechar et al 



2009a ] 



The development of better models to be tested in CSEP remains the highest priority. We present 
two models to estimate the probabilities of future earthquakes in California. The first model provides 
time-independent, long-term (5-year) estimates of the probabilities of future earthquakes of magnitudes 
m > 4.95 in California, in a format suitable for testing within CSEP. The second model estimates 
short-term (one-day) probabilities of f uture m > 3.95 ea r thqua kes. Both models are exten ded and/or 



modified versions of those developed by 



Helmstetter et al 



20071 ] and 



Helmstetter et al 



20061 1 . We made 



improvements to the long-term model and updated using the latest available data. The short-term 
model was extended from southern California to all of California. We made some further modifications 
(described in the text) and re-estimated its parameters using all available data up to f April 2009. 

Both models are based on simple, yet hotly contested hypotheses. Both models solely require past 
seismicity as data input. Whether and at which point other data such as geological or geodetic data 
could improve the forecasts remains an interesting open question. The long-term model assumes that 
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future earthquakes will occur with higher probability in areas where past earthquakes have occurred. 
To turn this hypothesis into a testable forecast, we smoothed the locations of past seismicity using an 
adaptive kernel. The long-term model assumes that sequences of triggered events are mostly temporary 
perturbations to the long-term rate; we therefore attempted to remove them using a relatively arbitrary 
method. Moreover, the magnitude of each earthquake is independently distributed acco rding to a tapered 



Gute nberg- Richter distribution with corner magnitude m c = 8.0 relevant for California 



Bird and Kaaan 



20041 ] . independent of geological setting, past earthquakes or any other characteristic. Because of the 
assumed magnitude independence, the model also uses small m > 2 events to forecast large m > 4.95 
events. The small quakes indicate regions of active seismicity, and retrospective tests support the claim 
that including these small events improves forecasts of larger ones. 

The time-dependent, short-term model makes similar assumptions. First, the same assumption holds 
regarding the distribution of magnitudes. Second, every earthquake can trigger future earthquakes with a 
rate that increases exponentially with the magnitude of the triggering event. The subjective, retrospective 
distinction between fore-, main- and aftershocks is thereby eliminated: every earthquake can trigger other 
events, and those triggered events may be larger than the triggering shock. The time-dependence enters 
in the form of the well-known Omori-Utsu law. However, in our model the Omori-Utsu law applies to all 
earthquakes, not just to large earthquakes that trigger smaller ones. In addition to the contribution to 
the seismicity from past quakes, the model assumes that there exists a background rate, which is modeled 
as a spatially heterogeneous Poisson process. The background's normalized spatial distribution is taken 
directly from the first, time-independent model, but its rate is estimat ed. The mod el is a particular 
implementation of the epidemic-type aftershock sequence (ETAS) model 



Ogata 



19881 1 . The model may 



be interpreted as a simple but powerful null hypothesis of earthquake clustering and triggering, based 
on empirical distributions of seismicity. 

This article (and its five electronic supplements) describes the two models, their calibration on earth- 
quake catalogs, results from retrospective forecasts, and what we learned about seismicity in the process. 
With these forecasts, we respond to CSEP's call for testable forecasts in a specific format: The expected 
number of quakes in individual magnitude bins of width 0.1 from m = 4 to m — 9 in each spatial cell of 
0.1 by 0.1 degrees latitude/longitude in a predefined testing region that inclu des California and extends 



about one degree beyond its borders 



Schorlemmer and Gerstenberaer 



2007 1- 



The article is structured as follows. We describe the data in section[2] (removing explosions is discussed 
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in Supplement 1). Section [3] explains our method for estimating the spatial distribution of spontaneous 
seismicity, which we used both for the long-term forecast and as the background in the short-term 
forecasts. We separated the data into two sets: the first set (the learning or training catalog) is used to 
forecast the second set (the target catalog). We estimated smoothing parameters by optimizing these 
forecasts. In Supplement 2, we optimize parameters for different target catalog magnitude thresholds. 
In section 2J we calculate the expected number of events and apply the Gutenberg- Richter distribution 
to generate the time-independent five-year forecast. In Supplement 3, we discuss a forecast for the so- 
called CSEP mainshock-only forecast group. To assess the time-dependent forecasts, we rescaled the 
five-year forecast to a one-day time-independent forecast (section [5}. Section [6] defines the ETAS model, 
describes the parameter estimation and discusses the model's goodness of fit to the data. Supplement 
4 treats the performance of the parameter optimization algorithm. Supplement 5 provides parameter 
estimates for target magnitudes m > 2 rather than the threshold m > 3.95 of the CSEP one-day forecast 
group. Section [7] compares the ETAS forecast with the time-independent model during the Baja swarm 
of February 2009. Both models will be installed at the SCEC testing center of CSEP. 



2 Data: The ANSS Earthquake Catalog 



We used the ANSS catalog in the period from 1 January 1981 until 1 April 2009 with magnitude m > 2 
in th e spatial region defined by the RELM/ CSEP collection region, as defined by the polygon in Table 



2 by 



Schorlemmer and Gerstenberaer 



20071 ]. The catalog listed 167,548 events. Underground nuclear 
explosions at the Nevada Test Site contaminate the earthquake catalog. Supplement 1 documents how 
we identified and removed 21 explosions from the catalog, 9 of which were large m > 5 events. 

Earthquake catalogs change over time, because of reprocessing, deletion or addition of past events. As 
a result, forecasts based on supposedly identical data differ from each other, sometimes signific antly. We 



Helmstetter et al 



2007t . The sole 



encountered this problem when attempting to reproduce the results ofL 
way to guarantee full reproducibility is thus to store the data set. The interested reader may obtain the 
data (and computer programs) from mercalli.ethz.ch/~mwerner/WORK/CaliforniaForecasts/CODE/ 
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3 Spatial Distribution of Spontaneous Seismicity 



3.1 Declustering Seismicity 



To estimate the spatial dist ribution of i ndependent events, we used a modified version of the Reasen 



Reasenbera 



1085 



Helmstetter et al 



2007]. Wc set the input parameters to 



berg declustering algorithm 
r fact — 8 (dimensionless) , x me // = 2 (units of magnitude), x% = 0.5 (dimensionless), pi = 0.95 (dimen- 
sionless probability), Tmin = 1 day an d T max = 5 days. We used as t he interaction distance the scaling 
r = 0.01 x io°- 5M km, as suggested by 



Wells and Coppersmith 



1994 ]. instead of r = 0.011 x io 4M km 



and r < 30 km in Reasenberg's algorithm. We further set the location errors to 1 km horizontal and 2 
km vertical. Figure [T] compares the original with the declustered catalog. The declustering algorithm 
found that 57% of the events were spontaneous. We chose this method to e stimate spontaneous seismic 



ity over more sophisticated algori t hms based on stochastic process theory 



2002 



2004 



Marsan and Lenaline 



Kaaan 



1991 



Zhuana et al 



20081 ] because of its simplicity. The declustering algorithm was not 
optimized for forecasting and the particular procedure may have significant effects on the forecasts. In 
the future, we'd like to test and optimize the declustering algorithm, too. 



3.2 Adaptive Kernel Smoothing of Declustered Seismicity 

We estimated the density of spontaneous seismicity in each 0.1 by 0.1 degree cell by smoothing the 
location of each earthquake in the declustered catalog with an isotropic adaptive kernel Kd^r). We 
tested two choices for Kd(r), either a power- law 



C(d) 



(ifi 2 + d 2 r 



or a Gaussian 



K d (r) = C'(d)exp 



2d 2 



(1) 



(2) 



where d is the adaptive smoothing distance, and C(d) and C'(d) are normalizing factors, so that the 
integral of Kd (r) over an infinite area equals 1. We measured the smoothing distance di associated with 
an earthquake i as the horizontal (epicentral) distance between event i and its n„th closest neighbor. 
The number of neighbors, n v , is an adjustable parameter, estimated by optimizing the spatial forecasts 
(see section [3.4fl . We imposed di > 0.5 km to account for location uncertainty. The kernel bandwidth 
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1980 1990 2000 2010 



Figure 1: a): Original ANSS catalog m > 2 from 1 January 1981 until 1 April 2009 in the CSEP collection 
polygon around California, b): Original (black) and declustered (red) seismicity per month, c): 
Origi nal (black) and de clustered (red) cumulative seismicity. To decluster, we used a method modified 
from{Reasenberq 1985J. 



6 



di thus decreases in regions of dense seismicity, so that we have better resolution (smaller di) where the 
density is higher. 

The density fj,(f) at any point f is then estimated from the N events by 

N 

/*(r) = ^i^(r-r1) (3) 

However, our forecasts are given as an average number of events in each 0.1° by 0.1° cell. We therefore 
integrated equation Q over each cell to obtain the seismicity rate in this cell. 

3.3 Correcting for Spatial Magnitude Incompleteness 

We used events with magnitudes m > 2 to estimate the spatial distribution of seismicity. Unfortunately, 
the catalog is not complete everywhere at this magnitude level. To correct for catalog incompleteness, 
we estimated the completeness magnitude mo in each cell from the magnitude distribution. 

We can estimate the magnitude distribution P m (f,m) at point f using the same kernel method as 
described above to estimate the density p>(r). We smoothed both the locations and the magnitudes of 
all (declustered) earthquakes using 



where Gh(m) is a Gaussian function of mean m and width h. The kernel width h was fixed to 0.15 
magnitude units. We then integrated P m (r, m) over each cell to get the magnitude distribution in these 
cells. We estimated the completeness magnitude mo in each cell as the magnitude at which the smoothed 
magnitude distribution is at a maximum. Using this method, we obtain large small-scale fluctuations of 
mo, which are probably unphysical. The completeness magnitude should be relatively smooth at scales 
smaller than the typical distance between seismic stations. Therefore, we smoothed the completeness 
magnitude mo using a Gaussian filter with a standard deviation of 0.15°. The result is shown in Figure 
[21 where we used the power-law kernel ([TJ with a smoothing distance di in equation @ to the 6th nearest 
neighbor (i.e. n v =6). Most of the region has mo ~ 2 (our method does not estimate completeness 
thresholds smaller than m = 2). The completeness magnitude is much larger, close to mo = 3.5, close 
to the boundaries of the collection region, especially in the Mendocino area and in Mexico. Comparing 



N 




(4) 
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Figure 2: Left: Completeness magnitude estimated from the maxima of the magnitude distribution in each cell. 
Right: Smoothed completeness magnitude. 



Helmstetter et al 



20071 ] . we found significant differences in the estimated 



our Figure [2] to Figure 2 of , 
completenes s magnitude due to two reasons. Firstly, we found and corrected a bug in the smoothing 



algorithm of 



Helmstetter et al 



2007], which artificially increased the completeness magnitude near the 



boundaries. Secondly, we used more earthquake data, up to 1 April 2009. 

Our method failed when few earthquakes exist from which to estimate the completeness magnitude. 
We introduced two simple ad hoc rules for such situations. First, if the magnitude threshold of the 
learning catalog is m min > 3.0, we set the completeness threshold to itlq = 3.0 to eliminate clearly 
unphysical fluctuations. Thus, we only correct for a spatially varying completeness threshold for learning 
catalogs that include events m m in < 3, when completeness effects are severe. Second, if the estimated 
mo > 3.5, we set mo = 3.5 to make sure the completeness level remains reas onable. There are other, more 
soph isticated methods to estimate the completeness magnitude (see, e.g 



Schorlemmer and Woessner 



20081 ] and references therein), which we hope to use in the future. 

Since we are interested in estimating the rate of earthquakes m > m m i„, we corrected the observed 
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rate for missing events by applying a Gutenberg- Richter scaling Gutenber g and Richter] , 11944] with b — 1 
(see section |4~T1 for the b- value estimation) 



//(f) = n(f) • I0 mo «- m ""™ (5) 

3.4 Optimizing the Spatial Smoothing 

We estimated the parameter n v , the number of neighbors used to compute the smoothing distance di 
in equation (|3]), by maximizing the likelihood of the model. We built the model fi'(i x ,i y ) in each cell 
(ix, iy) on a training period of the catalog and tested it (evaluated the likelihood) on a separate testing 
period of the catalog. Our model assumes independence of magnitudes and locations. We therefore 
solely tested the spatial distribution of a model forecast, neglecting magnitudes. Moreover, we assumed 
that spontaneous background seismicity is time-independent, so that the expected number of events is 
simply a pre-factor to a normalized spatial density. Therefore, we optimized solely the normalized spatial 
density estimate in each cell (i x ,i y ) using 

*/• -x Lt'(ix,i y )N t 

M( ^ ) = E~E^(^) (6) 

where Nt is the number of observed target events. The expected number of events for the model fi* thus 
equals the observed number Nt to optimize solely the spatial forecast. 

We assumed that the observed earthquakes in each cell are Poisson random variables, independent of 
each other in space and time. We revisit this assumption in sections 14.41 and 14.51 The log-likelihood of 
the model is thus given by 

LL = 53 53 log p[M* (is (7) 

where n is the number of events that occurred in cell (i x , i y ), and the probability p of observing n events 
in cell (ix,i y ) given a forecast of (j,*(i x ,i y ) in that cell is given by the Poisson distribution 

p[fl (lx,ly),n] = [/J (lx,ly)] • (8) 

We built an extensive set of background models fi* by varying (i) the spatial smoothing kernel, either 
Gaussian or power-law, (ii) the input /training catalog, and (iii) the target catalog. The training and the 
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target catalog were chosen so that they do not overlap (except in certain cases discussed below). We 
generated forecasts based on the training catalog and evaluated them on the separate target catalog to 
test our forecasts out-of-sample and to avoid over-fitting. We evaluated the performance of each model 
by calculating its probability gain per earthquake relative to a model with a uniform spatial density: 

. — y / LL LL un if \ . . 

= 6XP { N t J (9) 

where LL un if is the log-likelihood of the uniform model and N t is the number of observed events in the 
duration t of the target catalog. 



3.5 Results of the Spatial Smoothing Optimization 

We smoothed the declustered catalog according to the procedure described above and as previously 



perfor med by 



Helmstetter et al 



Helmstetter et al 



2007]. Apart from applying more (updated) data than 
20071 ] and using a modified procedure for estimating the completeness magnitude, we also corrected the 
code for artificial boundary effects and rounding errors which led to some events being assigned to the 
wrong cells. In the following sections, we highlight different aspects of the results of the optimization, 
which are summarized in Table [T] 

Table[T]shows the probability gain of the smoothed seismicity forecasts as a function of the magnitude 
threshold m m i„ of the learning catalog. The probability gain increases slightly from m m i n = 2 (G = 5.13) 
until mmin = 3 (G = 6.73) before decreasing rapidly beyond m m in = 4. Therefore, small m > 2 
earthquakes help predict the locations of future large m > 5 shocks. 

While small earthquakes continue to help make better forecasts, the precise values of the gain fluctuate 
for different target periods. Smaller gains indicate that some target events are surprises, occurring in 
locations of little previous seismicity. Such events may be better forecast by a longer catalog. For the 
1989 to 1993 target period, the poor likelihood score was due to three earthquakes that occurred near 
(42.3./V, 122. 0-B). There, the uniform forecast (fi un if = 1.3 • 10~ 4 expected quakes per 5 years per cell) 
outperformed the smoothed seismicity forecast (fj, ~ 3 • 10 -6 expected quakes per 5 years per cell) . Thus, 
large, surprising events may occur in regions that saw no activity in over a decade, not even small m > 2 
events. 

Interestingly, the gain does not monotonically decrease with shorter learning period, i.e. statistical 
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fluctuations influence the results. Nevertheless, including small earthquakes to generate forecasts for 
large m > 5 earthquakes improves estimates of earthquake potential over estimates based solely on large 
events. This suggests that large earthquakes occur on average in the same locations as small earthquakes. 

We tested two spatial smoothing kernels: the Gaussian and the power-law kernel. Table Q] shows 
that the Ga ussian kernel performs m arginally better than the power-law kernel, in contrast to the results 



obtained by 



Helmstetter et al. 



20071 ] . However, different target periods can show reversed results. Given 
that the gain varies between different target periods, the differences between the Gaussian and the power- 
law kernel may not be significant. We preferred the power-law kernel because seismicity occurs on fractal 
networks and because the optimal smoothing parameter n v was more robust. 

In Table [T] we tested the influence of varying the magnitude threshold of the learning catalog. In 
Supplement 2, we show that the gain is roughly constant for models in which only the magnitude threshold 
of the target catalog is increased from 2 to 5.5, suggesting that large earthquakes on average occur in 
the same locations as small earthquakes. Our method works well for any target magnitudes. 

Our favorite spatial background for a new five-year forecast is given by model 21, which uses the entire 
available catalog. Compared to model 20, where n v = 1, model 21 had slightly smaller gain. However, 
the optimal smoothing parameter n v varies from 1 (model 3) to 9 (model 19) for different 5-year target 
periods. The average optimal n v over the 4 different target periods equals 3.75, which rounds to 4. 
However, a slightly smoother model is preferable since we only used a short catalog since 1981, while 
maintaining a comparable probability gain. Model 21 with n v = 6 fulfilled these requirements. In the 
future, we'd like to optimize the smoothing parameter n v over different target periods. 

In Supplement 3, we repeated the spatial forecast optimization on a declustered target catalog, in 
order to generate a second long-term forecast targeting the ma inshock-only forecast group of CSEP 



Schorlemmer et al 



2007 



Schorlemmer and Gerstenberaer 



2007 1- 



We performed further tests to investigate the effects of different input and target catalogs. Increasing 
the collection region from the CSEP collection region to a much larger rectangle around California slightly 
increases the performance of the smoothed seismicity forecasts. A spatial forecast produced by smoothing 
the entire non-declustered, original training catalog performed significantly worse than the declustered 
smoothed seismicity. Without accounting for the temporal Omori-Utsu decay of clustered events, the 
long-term forecast is dominated by short-term clustering and too localized. 
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4 Time-Independent 5- Year m > 4.95 Forecast Based on 
Smoothed Seismicity 
4.1 Magnitude Distribution 

We assumed that the cumulative magnitude probability distribution obe ys a tapered Gutenberg-R ichter 



distribution with a uniform b-value and corner magnitude m c (Eq. 10 in 



Helmstetter et al 



20071 1) 



P(M > m) 



10 



-b{m — m rn i ri ) 



exp 



[lO 1 



5(rn rn ^ 7l —m c ) 



10 



1.5(m — m c ) 



(10) 



with a minimum magnitude m m j n = 4.95 (for the five -year CSEP forecast group) and a corner magni- 
tude m c = 8.0 as suggested by 



Bird and Kaaan 



estimated the b-value using maximum likelihood 



2004 



Aki 



for continental transform fault boundaries. We 



1965( | and found b ~ 1 for m > 2. 



In the geothermally active Geysers region in Northern California, the distribution of the magnitudes 
deviates from the standard Gutenberg-Richter distribution with 6 equal to one (Figure [3}: a break 
occurs around in ~ 3.3. Below the break, the b-value is b w 1, while above it is b ~ 1.75. To forecast 
the expected number of earthquakes in each magnitude bin of size Am = 0.1, we used this modified 
magnitude distribution for the Geysers area (for —122.9° <lon< —122.7° and 38.7° <lat< 38.9°). Other 
geothermal regions in California seem to behave more regularly, indicating that geothermal activit y is 



not the sole cause of the anomalous magnitude distribution at the Geysers 



Helmstetter et al 



2007] 



4.2 Expected Number of Events 

To estimate the expected number of earthquakes, we counted the number of m > 4.95 events in the time 
period from 1 January 1981 until 1 April 2009 and divided by the time period of the catalog in years to 
obtain N pre d = 6.71 earthquakes per year in the CSEP testing region. The expected number of events 
per year in each space-magnitude bin (i x ,i y ,im) was then calculated from 

E (i x ,i y ,im) = N pred n(i x ,i y ) P(i m ) (11) 

where fj, is the normalized spatial background density, and P(i m ) is the integrated probability of an 
earthquake in magnitude bin (i m ) defined according to equation (|10|) and taking into account the different 
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2 3 4 5 6 7 

magnitude 



Figure 3: Magnitude distribution of earthquakes m > 2 from 1 January 1981 to 1 April 2009 in the Geysers 
region (blue circles) and in the remainder of California (red squares). Near the geothermally active 
Geysers, the distribution differs from the usual Gutenberg- Richter distribution with b- value equal to 
one. 



b- value for the Geysers. 

Figure [4] shows our new five-year forecast for m > 4.95 for 2010 to 2015 based on optimally and 
adaptively smoothing the locations of declustered, small m > 2 earthquakes. On average, we expect 
33.55 earthquakes m > 4.95 over the next five years in the entire test region. The modified magnitude 
distribution near the Geysers helps us avoid an unrealistically high rate of large quakes there. 



4.3 Comparison of Our New Five- Year Forecast to 



Helmstetter et al 



20071 | 's Forecast 



In gen erating our new five-year earthquake forecast, we used data up to 1 April 2009. 



Helmstetter et al. 



2007 1 used data up to A ugust 2005. We expect 33.55 earthquakes m > 4.95 over the next five years, while 



Helmstetter et al. 



20071 ] expected 35.40 earthquakes m > 4.95 over the five-y ear period from January 



Helmstetter et al 



2007] 



2006 until December 2010. Another difference between our forecast and that of 
is that we fixed two minor bugs in the smoothing algorithm and modified the procedure for estimating 
the completeness magnitude. To compare solely the spatial distribution of the forecasts, we normalized 
each forecast to an overall rate equal to 1. Figure \5\ shows the ratio of the normalized new forecast over 



the normalized forecast by 



Helmstetter et al. 



2007 ] 
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Figure 4: Expected number of earthquakes m > 4.95 per 0.1° by 0.1° cell per five years from 2010 to 2015, based 
on model 21 in Table [T] We used an optimized, adaptive power- law kernel to smooth the locations of 
dcclustered m > 2 earthquakes from 1 January 1981 until 1 April 2009. Also shown as black squares 
are m > 4.95 quakes that occurred between 2003 and 2008. 
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Figure 5: Ratio of our new spatial five-year forecast for m > 4.95 over the spatial forecast bv\Helmstett er~et al\ 
I20071I . each normalized to 1. Also shown by black squares are the 12 earthquakes m > 4.95 that 
occurred from 1 January 2006 until 1 July 2008, the half-time mark of the five- year prospective Regional 
Earthquake Li kelihood Models (RELM) experiment, in which the model by \Helmstetter et all [200711 
is leading (see [Schorlemm er et all |2009|| ) . Five of the 12 events are located in Baja and overlap. 



Our new forecast is not very different from the old forecast in much of California (see the yellow 
regions in Figure [5]). The differences (up to a factor of 10) in some localized regions are mostly due to 
recent earthquakes since August 2005 that increased the new forecast. However, there are also more 
unexpected differences. Firstly, the narrow stripes of increases or decreases along the boundaries were 
artifacts in the old forecast, which we fixed by correctly smoothing the completeness magnitude. Secondly, 
some regions in the South, South-East and in the North show that the old forecast expected about 30 
times as many earthquakes as our new forecast. These differences are partially due to the corrected 
smoothing and partially due to the updated data set. The ANSS catalog may also have changed in the 
meantime, making forecast calculations somewhat irreproducible (see section [5). We stored the data at 
mercalli . ethz . ch/~mwerner/W0RK/Calif orniaForecasts/CODE/ 



Helmstetter et al. 



20071 ] estimated 



Another region showing a factor 10 difference is the Geysers area 
b ~ 1.99 in this region, while we estimated b ~ 1.75. We also used a slightly different technique to 
extrapolate the obs erved m > 2 rate to even ts m > 4.95 
The forecast by 



Helmstetter et al 



2007| was submitted to the five-year RELM experiment 



Field 
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20071 ]. \Schorlemmer et al\ [2009I | evaluated the submitted models after the first two and a half years of 
the five-year period. 12 earthquakes m > 4.95 had be en observed (see the bla ck squares in Figure [5] 



five of which overlap in Baja California). The model by 



Helmstetter et al 



2007] outperformed all other 



models. Our forecast, w hich was partially built on the same data set used to evaluate the forecast by 



Helmstetter et al. 



2007], would have outperformed the old forecast by a gain of G — 1.18 per event. 



How would our forecast compare if it were built on the same training data? We created a forecast 



based on the same data that 



Helmstetter et al 



20071 ] used, up to August 23, 2005. We compared both the 



goodness of fits and the performance during the first half of the RELM experime nt. To compare goodnes s 



of fit, our model 22 in Table [T] should be compared with model 21 in Table 1 of Helmstetter et al 
Our optimal smoothing parameter was n v — 1, while that of Helmstetter et al 



2007 ] 



20071 ] was n v = 6; the 



numbers of earthquakes var y by several dozen (see s ection [H; the gain we found for n v = 6 is G = 6.85 



below the G = 7.08 found by 



Helmstetter et al 



20071 ] . but for n v — 1, we find the same value of G — 7.08. 



Thus, despite the minor bugs we identified and fixed, the goodness of fit is essentially the same. 

To compare performance in the RELM experiment, we needed to estimate the magnitude distribution 
and the number of expected events. We esti mated b ~ 1 for Californi a. For the Geysers region, we found 
b w 1.88, closer to the estimate b w 1.99 by 



Helmstetter et al. 



20071 ] than our previous one of b « 1.75, 



for which we used all data up to April 2009. For the first half of the five-year RELM experiment period, 
the new forecast performed worse than the previous one (G = 0.83 per earthquake). The differences are 
due to three events near the Mendocino Triple Junction. However, the differences are small and future 
earthquakes can change the probability gain significantly. 



4.4 Retrospective Consistency Tests of the 5-Year Forecast 

Thus far we optimized our model by maximizing its likelihood in retrospective forecasts, but we did 
not test whether our forec ast is consistent with past data. To do t hat, we used the consistency tests of 



the on going RELM project 



Schorlemmer et al 



2007 



20091 ] (see also 



Kaaan and Jackson 



19941 ]; 



Jackson 



199y|). We performed two different tests: one to test whether the expected number of events is consistent 
with the observed number of events (the N test), and another one to test whether the model's simulated 
likelihood values are consistent with the observed likelihood value (the L-test). 

Time-independent forecasts are usually specified as a Poisson rate, i.e. the probability distribution of 
the number of events under the model is given by a Poisson distribution with mean equal to the expected 
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(mean) number of events (N pre d = 33.55, in our case). Using this distribution, the N-test calculates 
the probability S that, according to the model, the actually observed number or a smaller number is 
observed. If 8 is very low (less than 0.025) or very large (larger than 0.975), we reject the model with 
95% confidence because, under the model, the observed number of events is highly unlikely. For the 
5-year target period 2004-2008, there were N a i, s = 25 events, so that S = 0.08 and the model could not 
be rejected. 

To test whether the model is consistent with the observed likelihood value, we resorted to simulations. 
In each space- magnitude bin, we simulated the number of observed events according to the model forecast 
in that bin, again using the Poisson distribution. We then calculated the likelihood value of this particular 
simulation. We performed 10, 000 simulations, from which we obtained a distribution of likelihood values. 
The L test compares the observed log-likelihood value with these simulated log-likelihood values by 
calculating the probability 7 that the observed log-likelihood value or a smaller value was simulated. If 
the fraction 7 is extremely low (below 0.025), then we reject the model, because, under the model, the 
observed log-likelihood score is highly unlikely. However, we do not reject the model if 7 is very large, 
since the model might be too smo oth, so that the likelihood of the data might be much higher than 
expected under the model (see also 



Schorlemmer et al. 



20071 ]'). For the 5- year target period from 2004 



to 2008 inclusive, the observed likelihood value was LL b s = —172.8, which resulted in 7 = 0.975, i.e. 
the model was not rejected. 

We repeated the N and L tests of our forecast using a total of 24 overlapping 5-year target periods, 
the first of which started in 1981, the second in 1982, etc., until the last target period from 2004-2008. 
Figure [S] shows the 5 and 7 values as a function of starting year of each of these 24 target periods. 
We also show for reference the number of observed events (red circles) in each target period, and our 
expected number of events N pre d = 33.55 (solid red line) and the 95% confidence bounds of the Poisson 
distribution (red dashed lines). 

The model was rejected by the N test in several target periods: in those that contain the 1992 Mw7-3 
Landers earthquake (too many observed events) and in a few that happen during the relatively quiet 
period 1999-2003 (too few observed events, despite the 1999 Mw7.1 Hector Mine sequence). The 7 
statistic also rejected the model in the target periods that contain the 1992 Mw7-3 Landers earthquake. 
Since the L test is defined to be one-sided, we could not reject the model for 7 > 0.975 during the quiet 
phase in the first years of the new millennium (but the N test did reject the model). The anti-correlations 
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1990 1995 2000 

Start of 5Yr Target Period 

Figure 6: Retrospective consistency tests of our 5-year time-independent forecast on past, overlapping 5-year 
periods with starting year as abscissa. Left (blue) ordinate: The N-test statistic S (blue squares), 
measuring the agreement between the observed and the predicted total number of events, and the 
L-test statistic 7 (blue circles), measuring the agreement between the observed and the expected 
likelihood score. Right (red) ordinate: Number of observed events in each target period (red circles), 
the expected number of events (red solid line) and 95% Poisson confidence bounds (red dashed lines). 
Grey bars denote N-test rejection regions; the lower grey bar marks the L-test rejection region. 



between the two statistics were very strong and indicate that, in this case, the L test is dominated by 
the number of observed events. To make the test more sensitive to the space and magnitude forecast of 
the model, we propose to modify the L-test in two ways: To nor malize by the numbe r of events, and to 



Zechar et al 



2009b| also proposed the 



test only the spatial component of the forecast (see section I4~5[l 
latter spatial (S) test and we follow their notation. 

Figure [6] also shows that the assumption of Poissonian variability in the number of observed events 
is wrong: The variance of the number of observed events per 5 year period is larger than expected 
from the Poissonian model. The model is rejected ten times in 24 overlapping periods. Moreover, since 
the number of observed events sometimes exceeded the maximum allowed by our Poisson forecast, and 
sometimes dropped below the minimum of our forecast, we could not construct a Poisson forecast that 
would be accepted by the N test for every target period (or even for 95% of the target periods). Thus 
even for 5-year periods, a Poisson forecast is inappropriate when attempting to forecast all earthquakes 
(Supplement 3 demonstrates that the Poisson assumption seems justified for ill-defined mainshock-only 
forecasts). The next section presents a forecast with a non- Poissonian distribution of the number of 
events. 
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4.5 Negative Binomial Earthquake Number Distribution and Modified 
Likelihood Tests 

We analyzed the distribution of the number of observed events in 5-year, non-overlapping intervals in the 
ANSS catalog of magnitudes m > 4.95 from 1932-2008 within the CSEP testing region. We compared the 
fit of the Poisson distribution to the empirical distribution with the fit of a negative binomial distribution 
(NBD). The discrete negative binomial probability mass function is defined by: 



p(k\r, v) = ^ ~ ")* 



where k = 0,1,2,..., T is the gamma function, < v < 1, and r > 0. There are 
distributions, but the NBD is simple, fits well and has been used before 



(12) 



Jackson and Kaaan 



> 0. There are i 


nany discrete 


Vere-Jones 


1970; 


Kaaan, 


1973 



19991 ], The Akaike Information Criterion (AIC) favors the NBD (AIC = 130.4) over 
the Poisson distribution (AIC = 246.1). Using maximum likelihood, we obtained parameter estimates 
of the NBD ([r w 2.76, v w 0.08]) and of the Poisson distribution (A » 29.9). 

We created a NBD forecast, different from our Poisson forecast solely in the distribution of the number 
of events, summed over all space and magnitude bins (rates in individual bins remained the same). The 
two parameters of our NBD forecast can be calculated from the mean and variance of the distribution. 
For the mean, we preferred our earlier estimate (from section 14. 2p of the expected number of events, 
Np re d = 33.55, based on the more recent, higher quality ANSS data from 1981 to 2009. To estimate the 
variance Var(N i, s ) « 368.1, however, we used the longer data set from 1932. From these values, we 
obtained the NBD parameters ([r « 3.37, v ~ 0.09]). 

We repeated the retrospective N test on our new NBD forecast, the results of which are shown in 
Figure [7] Compared to Figure [6] our new NBD forecast cannot be rejected for any target period, since 
the number of observed events is always well within the 95% confidence bounds of our NBD distribution. 

We mentioned above that the 7 statistic conveyed little additional information over the 8 statistic, 
because it depends too strongly on whether the observed number of events were within the 95% confidence 



bound s of the number distribution. We therefore modified the L-test proposed by 



Schorlemmer et al 



20071 ] in two ways. For simulated likelihood scores, we used the same total number of events as were 



observed, co nditioning the simulated l ikelihood scores on the observed number of events. In contrast, 



the L-test by 



Schorlemmer et al 



20071 ] simulated a random number of simulated events drawn from the 
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Figure 7: Improved retrospective consistency tests of the 5-year negative-binomial forecast on past, overlapping 
5-year periods with starting year as abscissa. Left (blue) ordinate: The negative binomial N-test 
statistic <5 (solid blue), measuring the agreement between the predicted and the observed number of 
total events; the spatial likelihood test statistic £ (dashed blue), measuring the agreement between 
the expected and the observed spatial component of the likelihood, and the modified likelihood test 
statistic 7' (dotted blue), measuring the agreement between the expected and the observed likelihood 
scores when normalized to the number of observed events. Right (red) ordinate: Number of observed 
events in each target period (red circles), along with the forecast's expected number of events (red 
solid line) and 95% negative binomial confidence bounds (red dashed lines). 



Poisson distribution with mean equal to the total expected number of events. Thus each simulation in 
the original L-test has (potentially) a different number of events. This number is fixed in our modified 
L-test. We denote the probability that the observed likelihood value or less were simulated, conditional 
on the number of observed events, by 7'. The modified L-test tests the space and magnitude component 
of the forecast, but not the number of observed events. Results of the retrospective tests are shown in 
Figure [7] (blue dotted curve). 

To solely test the spatial component of the forecast, we modified the L-test again by summing the 
forecast over all magnitude bins to eliminate the magnitude dimension. We then simula ted likelihood 
values from this space-only forecast, conditioned on the number of observed events (see also 



Zecharetal 



2009b |). We denote the proba bility of simulating t he observed spatial component of the likelihood score 



or a smaller value by £, using 



Zechar et al 



2009bl ]'s notation. The results of this S-test are also shown 



in Figure [7] (blue dashed curve) . The model cannot be rejected by these new consistency tests, but we 
should expect this result since we used the same data set to build the model. 

In this section, we dropped the Poisson distribution in favor of the NBD to better forecast the number 
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of earthquakes. This innocent, minor modification has major implications: (i) The sole time-independent 
stochastic point process is the Poisson process, and therefore any distribution but the Poisson distribution 
necessarily implies a time-dependent process. By acknowledging the NBD's better fit, we negate time- 
independent forecasting, (ii) We used a NBD for the total number of events, but in each individual bin 
remains specified as Poissonian. But summing the Poissonian rates over all bins cannot result in a NBD. 

How can we justify these inconsistencies? The short answer is: Because it is a quick and simple 
solution to a much deeper problem; An approximate solution that comes at the cost of a slight theoretical 
inconsistency. Moreover, the modified L-test we proposed no longer tests the overall number of observed 
events, and is thus to some extent decoupled from the NBD distribution. This is the approach we take in 
this article. However, the long answer is: We cannot justify it, and we must abandon time-independent 
earthquake forecasts. The Poisson process is an approximation that leads to great simplification, but 
the approximation is starting to catch up with us. We need time- and history-dependent process with 
memory, even for long-term forecasts (unless one forecasts so-called mainshocks (see Supplement 3), 
which can only be defined subjectively and retrospectively). Our ETAS branching model (section [S} is 
such a process, and in the future, we'd like to use it for long-term forecasts, too. 



5 Time-Independent Next-Day m > 3.95 Forecasts Based 
on Smoothed Seismicity 

To compare our prospective next-day forecasts of the ETAS model, described next in section [6] to 
a simple yet informative and prospective reference model, we produced a time-independent next-day 
Poisson forecast based solely on smoothed seismicity. The forecast is identical to the five-year m > 4.95 
forecast, except that we used m m in — 3.95 as the minimum target magnitude and we calculated the 
expected number of events N pre d per day for m > 3.95. There were N pre d = 0.177 earthquakes m > 3.95 
per day in the period from 1 January 1981 until 1 April 2009. We do not expect this forecast to perform 
well, since pulses of triggered earthquakes violate the time- independent nature of this forecast. 
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6 The Epidemic- Type Aftershock Sequences (ETAS) Model 



6.1 Definition of the ETAS Model 



Ogata 



1988 



1998] is a stochastic spatio- 



The Epidemic- Type Aftershock Sequence (ETAS) model 
temporal branching point-process model of seismicity. The model embodies the notion that every earth- 
quake may trigger other earthquakes according to empirical probability distributions in time, space and 
magnitude. The triggering rate increases exponentially 10 am with magnitude m of the parent event and 
decays in time according to the Omori-Utsu law and in space according to a spatial decay function, e.g. a 
Gaussian or a power-law kernel. Each magnitude is identically and independently distributed according 
to the tapered Gutenberg-Richter distribution, independent of past seismicity. So-called aftershocks may 
thus be larger than the initiating shock. The ETAS model can model entire earthquake catalogs, rather 
than just aftershock sequences, because the total rate is determined by the sum of a time-independent 
background rate and the resulting cascades of triggered events. 

There are numerous flavors wi t hin the branching process family ( see, e.g. 



Kaqan 



20041 1; 



19911; 



Felzer et al 



Hainzl and Ogata 



2002] 



Helmstetter and Sornette 



2002] 



Kaqan and Knovoff 



Console et al 



2005[). We used the particular formulation of 



20oi: 



Helmstetter et al 



19871 



Zhuang et al. 



20061 ]. albeit 



with some (minor) modifications. The total seismicity rate X(t, f, m) at time t, location f and for 
magnitude m is the sum of a background rate fAb(r), a spatially heterogeneous time-independent Poisson 
process, and the sum of individual contributions to the triggering potential from all prior earthquakes 
occurring at times U < t with magnitudes mj 



X(t, r,m) = Pm(m) 



fib 



(r) + 22 m , (r — fi,t - U) 



(13) 



where P m (m) is the space- and time-independent tapered Gutenberg-Richter distribution of magnitudes 
(|10|) . The triggering function <f) mi (r,t) describes the spatio-temporal triggering potential at a distance r 
and time t after an earthquake of magnitude m 



4>m(r,t) = p(m)tjj(t)f(r,m), 



(14) 
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where p(m) is the average number of earthquakes triggered by a quake of magnitude m > md 



p(m) = fclO 



a(m-m d ) 



(15) 



the function ip(t) is the Omori-Utsu law normalized to 1 

(p - iK" 1 



(t + c)p 



(16) 



and f(r, m) is a normalized spatial aftershock density at a distance r to the parent shock of magnitude 
m. The constant mj refers to a chosen threshold magnitude above which the model parameters are 
estimated, which may, in general, be different from the completeness magnitude mo and the target 
magnitude threshold m m i„. We tested both a Gaussian and a power-law kernel, as d escribed in section 
6.41 We fixed the c- value in Omori's law (|16[) to 0.035 days (5 minutes). According to 



Helmstetter et al 



2006l | . this parameter is not important as long as it is much smaller than the time window of the forecast 
(1 day). But sometimes, e.g. for the 1992 Landers earthqua ke, c is greater than 1 day. Section 



discusses this problem in more detail, along with a solution by 



Helmstetter et al 



2006] 



We used the same tapered Gutenberg-Richt er distribution (|10p as for the long-term forecast, with a 



b-value of 1.0 and corner magnitude m c = 8.0 



Bird and Kaaan 



2004]. Wc used earthquakes m > 2 to 



forecast events m m i„ > 3.95 for the CSEP experiment (Supplement 5 presents parameter estimates for 
target threshold m m i n — 2). Since the earthquake catalog is not complete down to md = 2 after large 
earthquakes, we corrected for this effect (see section I6.3|l , as well as for any time-independent spatial 
incompleteness. 

The background rate fi b is given by 

Hb(f) = HsPo{r) (17) 

where po{r) is equal to our favorite spatial, time-independent model normalized to one (model 21 in 
Table [TJ, so that the parameter fj, s represents the expected number of m > m m i n background events per 
day. Since we used the same spatial model for the time-independent m > 3.95 forecasts (section [SJ, we 
can evaluate directly the improvement of the ETAS forecasts over a static model. 
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6.2 ETAS Forecasts for 1-Day Bins 

The ETAS model is defi ned via its conditional i ntensi ty or hazard rate (|13[) . which is the instantaneous 



probability of an event 



Daley and Vere-Jones 



20041 ] . To forecast the number of earthquakes over a 



finite period, one cannot directly use the conditional intensity, since intervening quakes may change 
the rate. One solution to forecast over a 1-day period is to simulate earthquakes according to the 
conditional intensity at the beginning of the 1-day bin, and then to obtain a mean forecast by averaging 
over all simulations. Such simulations are computationally intensive, and it is not clear whether the 
10, 000 simulations commonly used to produce spatial-temporal forecasts are adequately sampling the 
possibilities. 

A simpler solution is the following. 



Helmstetter and Sornette 



2003] showed that, for synthetic ETAS 



catalogs, the use of N p = J^ p+T X(t)dt to predict the number of quakes between t p and t p + T under- 
estimates the number of actually occurred earthquakes by an approximate constant factor, independent 
of the number of future events. We can use the ETAS model (|13jl to forecast the number of events of 
the next day, but wit h effective parame t ers k, a and fi s , which are different from the original parameters 
of the ETAS model 



Helmstetter et al 



2006]. Instead of using the likelihood of the ETAS model, we 
estimated parameters by maximizing the likelihood of the next-day forecasts using a Poisson likelihood 
defined in section 16.51 These effective parameters depend on the forecast horizon and are difficult to 



compare to the original ETAS parameters. 



6.3 Correcting for Spatial and Temporal Magnitude Incompleteness 

The ANSS catalog is not complete down to magnitude m — 2 everywhere in California. To correct for the 
spatial incompleteness, we used the same completeness magnitude estimate as for our time-independent 
forecast, as described in section [3731 and shown in Figure [2] However, for our time-depen dent forecasts 
the temporary incompleteness of catalogs after large earthquake s also becomes important 



Helmstetter et al 



2006 



Peng et al 



2007 



Lennartz et al 



Kaaan 



2004 



2008| showed that the completeness magnitude 



after large earthquakes can temporarily increase by several magnitudes. One effect is that the forecast 
overestimates the observed rate because of missing events. Another effect is that secondary triggering is 
underestima ted because early undete cted aftershocks contribute to the rate. We followed the solutions 



proposed by 



Helmstetter et al 



2006] to correct both effects. 
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We estimated the completeness magnitude mo(t,m) as a function of time t (in days) after an event 
of magnitude m using: 

mo(t, m) = m - 4.5 - 0.75 log 10 (i) (18) 

and 

m {t,m)>2 (19) 



Figure 6 of 



Helmstetter et al 



2006] illustrates this relation for sequences triggered by the 1992 Mw7-3 
Landers, the 1994 Mw6-7 Northridge and the 1999 Mw7.1 Hector Mine earthquakes. 

We used expression (|18[1 to estimate the detection threshold at the time of each earthquake. This 
time-dependent threshold is usually larger than the regular spatially-varying threshold for earthquakes 
that occur shortly after large m > 5 earthquakes. We selected only earthquakes with m > mo to estimate 
the seismicity rate (|13[1 and to calculate the likelihood of the forecasts. 

We also corrected the forecasts for the second effect, the missing contribution from undetected after- 
shocks ijid < m < mo(t) to the observed seismicity rate: We added a contribution to the rate p(m) due 
to detected earthquakes m > mo(t) ()15[) 

p*(m) = p(m) + _^_io i ' (m °( t) - m <*> [l - 10 -(t— )(-o(t)-m d )j _ (20) 

where mo(t) is the detection threshold at the time t of the earthquake, estimated by (|18[) . due to the 
effect of all prior m > 5 earthquakes. The second contribution corresponds to the effect of all earthquakes 
with md < m < mo(t) that occur on average for each detected earthquake. This contribution is of the 
same order as the contribution from the observed events for reasonable parameter values, because a large 
fraction of aftershocks are secondary aftershocks and because small earthquakes collectively trigger an 
equal amount of aftershocks as larger ones if a = b. 



6.4 Modeling the Spatial Distribution of Aftershocks 

We tested different choices for the spatial kernel KM m \ (f, m), which models the aftershock density at a 
distance r from a parent shock of magnitude m. As in section 13.21 we compared a power-law function 
|T]) with a Gaussian kernel ([2]). The spatial regularization distance d(m), which replaces the adaptive 
bandwidth di in the kernels {1} and ([2]), accounts for the finite rupture size and for location errors. We 
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assumed that d(m) is given by 



d{m) = 0.5 + f d ■ 0.01 ■ 10° 



l km, 



(21) 



where the first term on the right-hand-side accounts for location accuracy and the second term represents 
the aftershock zone length of an earthquake of magnitude m. The parameter fd is estimated by optimizing 
the forecasts and, in the case of the Gaussian kernel ([2]), should be close to one if the aftershock zone 
size is isotropic and 67% (one standard deviation) of trigger ed events fall within one ruptur e length. The 



rupture length estimate (|2ip is similar to the expression by 



Wells and Coppersmith 



1994] 



T he choice of the exponent 1.5 in the pow er-law kernel (JT| is motivated by recent studies 



2004 



Console et 



2003 



Zhuana et al 



Ogata 



20041 ]. who inverted this parameter in earthquake catalogs by 
maximizing the likelihood of the ETAS model, and who all found an exponent close to 1.5. This value 
also conveniently translates into an analytically integrable form. It predicts that the aftershock density 
decays with distance r from the parent shock as 1/r 3 in the far field, proportional to the static stress 
change. 

Large m > 5.5 earthquakes with a rupture length larger than the grid cell size of 0.1° are rarely 
followed by isotropic aftershock di stributions. We therefore used a more complex , ani sotropic kernel for 



these events, as done previously by 



Wiemer and Katsumata 



1999] 



Wiemer 



20001 ] and 



Helmstetter et al. 



20061 ]. We smoothed the locations of early, nearby aftershocks to estimate the main shock fault plane 
and other active faults in the immediate vicinity. We computed the distribution of later aftershocks of 
large m > 5.5 quakes by smoothing the locations of early aftershocks using 



K d (r,m) = J~) K d (\r- f,|,m,), 



(22) 



where the sum is over the main shock and all earthquakes that occur within a distance D a f t (m) before 
the issue time t p of the forecast and not after some time T a ft after the main shock. We usually took 
Daft = 0.02 x I0 0,5m km (approximately two rupture lengths) and T a ft = 2 days but tested other values 
in section l6~6l The kernel Kd{r, m) used to smooth the locations of early aftershocks is either a power- law 
([T]) or a Gaussian distribution ([2]), with an aft ershock zone length given by (|21[) for the main shocks, but 



by d — 2 km for the aftershocks. Figure 7 of 



Helmstetter et al 



2006] compares the Gaussian and the 



power-law kernel on the 1992 Mw7.3 Landers earthquake sequence. 
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Smoothing the locations of early aftershocks to forecast the spatial distribution of later aftershocks is a 
fast and completely automatic method to estimate the main shock rupture plane along which aftershocks 
tend to cluster. In section [rJUl we tested several values of the method's parameters. 



6.5 Definition of the Likelihood and Estimation of the ETAS Model 
Parameters 

We estimated the parameters of the ETAS model by maximizing the likelihood of the data. We inverted 
for five parameters: p (the exponent in Omori's law (|16|l 'l. k and a (characterizing the productivity ()15l) ). 
/j,s (the number of background events per day (JXTJ) ) , and fd (the size of the aftershock zone (1211) ). 

As already stated, we did not maximize the likelihood function of the ETAS model. Rather, we 
estimated effective parameters by maximizing the cumulative likelihood of the next-day forecasts using 
a Poisson distribution @. The log-likelihood of the forecasts is the sum of log-likelihoods in each space- 
time-magnitude bin indexed by (it,ix,i y ,im)'- 



N t JV X N v N m 

ll = ^2 ^2 ^2 l °gp( A p( i *>^> i !/> i "0> 7 ' 1 ) 



(23) 



where n is the number of observed events in the bin (it 



, tl, by, t, m 



and the probability p(-,n) is a Poisson 



distribution with a rate given by the expected number of events N p (it,i x ,i y ,im) (the forecast), which in 
turn is the integral over each space-time-magnitude bin of the predicted seismicity rate A(r, t, m) 



N p (it,ix,iy,im) = / / / / A(F,t, m) dm dy dx dt. (24) 

J it J i x J iy J i m 

The rules of the CSEP one-day forecast group dete rmine the bin size: 1 day in time, 0.1 deg rees longitude 



and latitude in space, and 0.1 units of magnitude 



Schorlemmer and Gerstenberaer 



2007] 



The log-likelihood (|23[) can be simplified by substituting the Poisson distribution ((SJ and noting that 
the sum over the space-magnitude bins need only be carried out over those bins in which earthquakes 
occurred 



-N p {i t )+^ ^2 n lo S [^p(*t, im)] -log(r 



— l im — 1 



(25) 
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where N p (it) is the total expected number of events in the time bin it between t(it) and t(it + T) 

N p (i t ) = fJ L s + fiP^i^itp ~ *<) dt - ( 26 ) 

Jit t«<t(n) 

The factor /; in (|26p is the integral of the spatial kernel fi(r — fi) over the grid, which is smaller than 1 
due to the finite grid size. 



1992], p. 402) and using 



We maximized the log-likelihood ([23} using a simplex algorithm (| Press et al. 
earthquakes above various m m i n from 1 January 1986 until 1 April 2009 in the CSEP testing region to 
test the forecasts. To calculate the seismicity rate (|13[) , however, we took into account earthquakes m > 2 
since 1 January 1981 that occurred within CSEP collection region. We tested the effect of variations of 
the spatial aftershock kernel, and different target magnitude thresholds. 

To quantify the performance of the short-term forecasts with respect to the time- independent forecast, 
we used, as before, the probability gain per earthquake © of the ETAS model likelihood with respect to 
the time-independent likelihood. The time-independent model is our favorite background density /z(r) 
(model 21 in Table[T]), which also serves as the background model of the ETAS model, but we normalized 
[i(r) so that the total number of expected target events equalled the observed number. 

Since the background model was built from the same data that the time-dependent model was tested 
on, the probability gain solely measures the relative increase of the spatio-temporal aspect of the ETAS 
forecast over the time-independent model. In a truly prospective mode, the actual likelihood values may 
be smaller, although the improvement over a time-independent forecast may remain high. 

By maximizing the likelihood of the Poisson forecasts, we assumed that the Poisson distribution in 
each space-time-magnitude bin is a first-order approximation to the actual, model-dependent distribution. 
The actual likelihood function of the ETAS model is known but the predictive next-day likelihood function 
must be simulated and can deviate substantially from the Poisson distribution. However, rather than 
performing computationally intensive simulations, we used the current model to estimate the mean rate 
in each bin. Extensions beyond the Poisson distribution will be left for the future. 

6.6 Estimated Parameter Values 



The parameter estimation program was modified from the one written by 



Helmstetter et al 



2006J : We 



extended the region to the CSEP collection region of California, and we fixed some issues in the code. 



28 



We tested different versions of the ETAS model (various spatial kernels, different learning and target 
magnitude thresholds, different parameter values of the early-aftershock-kernel smoothing procedure, 
etc.). Table [2] presents the results. For brevity, we discuss the performance of the optimization algorithm 
in Supplement 4. 

The exponent a in the productivity law (I15|l measures the relative import a nce of small versu s large 



earthquakes for the triggering budget 



Christovhersen and Smith 



2008 



Helmstetter 



Hainzl et al 



2008 



2003 



Felzer et al 



Felzer et al 



2004]: 



2004 



Helmstetter et al. 



Helmstetter et al 



2005 



20051 1 fo u nd 



that a is close to or equal to one by fitting the number of aftershocks as a function of main shock 
magnitude. Since the number of small earthquakes increases exponentially with decreasing magnit ude, 



their collective ability to trigger earthquakes equals that of the large earthquakes 



Helmstetter 



2003]. As 



a consequence, small, undetected earthquakes have a significant, time-dependent i mpact on the observed 



seismi cit y budget and the fai l ure to model their effect causes parameter bia s 



2005a 



Saichev and Sornette 



. 2005 


2006a 


Werner 


2007; 



Zhuana et al 



Sornette and Werner 



20081. Wc found a = 0.8 ±0.1 



for all models for targets m > 3.95. Because of the strong negative correlation between a and K, it is 
possible that 01 = 1 is within the uncertainties of the estimates. The estimate of a decreases when the 
aftershock kernel is pu rely isotropic and d o es not include the smoot hing of early aftershocks, consistent 



with the simulations by 



Hainzl et 



2008] 



Helmstetter et al 



20061 ] estimated a as 0.43 using essentially 



the same optimization procedure that we used, but for the target magnitude range m > 2. Supplement 
5 lists the estimated param eters for the m > 2 range: Here, our estimates are almost identical to those 



of 



Helmstetter et al 



2006] 



The exponent p in Omori's law was relatively high p ~ 1.27, although typical estimates use smaller 
magnitudes and maximize the likelihood of Omori's law, not of the Poisson forecasts. 

The background rate was also fairly stable across the different models, at fi a ~ 0.08, which was 
about 44% of the average daily number of earthquakes m > 3.95 over the entire period. The estimated 
proportion of triggered events is thus 56%. However, this is a lower bound on the actual proportion 
because events below mo(t) were not counted, and any es t imate of the fraction of triggered events 



depends on the magnitude threshold 



Sornette and Werner 



2005bJ. The background rate of models 



with the Gaussian kernel was slightly larger than of models with the power-law kernel, because the latter 
has much farther reach than the Gaussian kernel. 

The parameter fd is a measure of the size of the aftershock zone. As we explained above, for the 
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Gaussian spatial kernel the length d(m) (|21[) equals the standard deviation of the Gaussian, so that 
fd should be about 1. This parameter can be used as a sanity check of the optimization. We found 
reasonable values for fd between 0.5 and 0.9 in the case of the Gaussian kernel. 

Using the power-law kernel, fd was very small: between 0.08 and 0.25. When we did not smooth 
the locations of early aftershocks to forecast later ones, the value increased to a more reasonable 0.25. 
Nevertheless, these values are difficult to interpret. It suggests using the Gaussian kernel since the fit is 
better understood, despite the marginally better performance of the power-law kernel. 

A lower probability gain per earthquake was obtained, for both spatial kernels, when we did not 
smooth the locations of early aftershocks to forecast the locations of later aftershocks: For example, 
model 1, which used the anisotropic smoothing method for events m > 5.5, scored G = 6.19, while 
model 7, which used the isotropic kernel, obtained G = 6.04. The gains of the smoothing method are 
particularly strong during the early days in an aftershock sequence, as expected. We calculated the 
average daily log-likelihood ratio for the first 10 days after all 18 m > 6 events in the target period 
and found that model 1 outperformed model 7 by an average daily log-likelihood ratio of 0.28. During 
the aftershock sequences, the smoothing method thus works better than the isotropic kernel. However, 
the gains are somewhat diluted because the parameters that use the isotropic kernel tend to give better 
forecasts on the days of the actual m > 6 events. There seems to be a subtle trade-off effect here. 
The benefits of the smoothing technique did not change much when we varied the temporal window 
for smoothing from 1 to 3 days nor when we lowered the magnitude threshold for selecting events from 
m — 6 to m = 5 (see models 5 through 9). 

We tested whether adding the contribution p* (120[) of undetected events < m < mo(t) increases 
the likelihood values: This was not the case over the entire period. However, certain individual sequences 
were much better fit using this correction, e.g. the 1992 Mw 7.3 Landers earthquake sequence (not shown) . 
There seems to be a subtle trade-off effect between modeling sequences and performing well on the days 
of the actual mainshocks. 

6.7 Observed and Predicted Number of Events 

Figure [8] compares the observed (target) earthquakes and the predicted number of events. The left panel 
compares the daily forecasts against the daily observations. Three groups of days can be identified. 
Firstly, there are days on which the forecasts are low (close to the background rate) and few events occur 
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10 ° io 1 10 2 1990 1995 2000 2005 2010 

observed m>3.95 per day 

Figure 8: Left: Comparison of the daily expected number of earthquakes with observed numbers (blue circles). 

The solid red line corresponds to a perfect match. The solid black line represents the background rate 
of the ETAS model. Right: Comparison of cumulative observed number of m > 3.95 events and the 
cumulative daily ETAS forecast. 

that match the forecast. The second group corresponds to low forecasts near the ETAS background rate, 
but to many observed events that are inconsistent with the model. On those days, large earthquakes and 
triggered sequences occurred without foreshocks on previous days that could have helped the forecast. 
But the model's performance was worse than it could be: The CSEP rules for the one-day forecast group 
allow an update of the forecast only once every 24 hours. The full potential of the ETAS model can only 
be realized if updates are allowed after every earthquake. The third group close to the red line (marking 
the perfect match) shows the potential: These are days of active sequences that started the day before, 
the information of which allowed the model to reasonably forecast seismicity. 

The cumulative forecasts (right panel in Figure [8} followed the observed number of events relatively 
well, including the aftershock sequences. But the forecast systematically underestimated individual 
sequences, e.g. the period after the 1992 M\y7-3 Landers earthquake. We expected this underprediction 
as the model cannot update after each event. At the same time, the total cumulative number of events 
was matched well at the end of the entire period. This is because the likelihood penalizes for mismatches 
in the total number of events. But since the days during which large quakes and their aftershocks occur 
cannot be accurately forecast because of the updating rules, the parameters are biased: The background 
rate is overestimated to match the total number of events. These complications are a result of the one-day 
forecasts that are unnecessary for many model families. At the same time, the ETAS model will remain 
a poor predictor of strong events if no foreshocks raise the forecast. 
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How did the ETAS model perform against the time-independent forecast over the course of the 
entire period? Figure [9] shows the daily log-likelihood ratios between the ETAS model and the time- 
independent forecast. The largest ratios occurred on or just after large m > 6 earthquakes, which we 
marked by vertical dashed lines. The daily log-likelihood ratios are also color-coded by the number of 
observed events on each day. The performance of the two models was very similar for days on which no 
earthquakes occur (dark blue filled circles are obscured by overlying markers). The differences increased 
with the number of observed events. For instance, on the day of the 1999 Hector Mine earthquake, many 
earthquakes occurred, and the ETAS model strongly outperformed the Poisson model forecast. Several 
small earthquakes occurred before this large event, thereby locally increasing the ETAS forecast with 
respect to the Poisson model (see also Figure [TT1 discussed below). 

On days when the ETAS rate decayed to its background level, the time-independent forecast had 
a higher rate, so that if an earthquake occurred, the Poisson model beat the ETAS forecast. But the 
likelihood ratio on those days was never very large, since the background rate of the ETAS model is 
not much smaller than the Poisson rate. Therefore, the ETAS model is to some extent guarded against 
surprise events by its background rate, while making vastly better forecasts during active sequences. 

Figure [9] shows that the ETAS model performed significantly better than the Poisson model after 
all large m > 6 earthquakes. Additionally, it performed better on the days of large events that were 
preceded by foreshocks. We discuss a few individual earthquake sequences below. 

6.8 Observed and Modeled Temporal Distribution of Aftershocks 

Figure [TU] compares the observed number of events with ETAS forecasts from 1992 until August 1994, 
along with the daily probability gain per quake (0, which normalizes the exponent of the likelihood 
ratio by the number of observed events per day. Also shown are the background rate of the ETAS model 
(dashed red line) and the rate of the time-independent forecast (dashed blue line) . The figure shows how 
the ETAS forecast tracked the observed seismicity during the aftershock sequences of this particularly 
active period. The largest gains correspond to the most active days. Individual gains were as large as 
G ~ 10 4 per day per earthquake. 

To get an even more detailed picture of the ETAS model's performance in individual aftershock 
sequences, we compared the forecast and the observations just before and after 4 major earthquakes in 
Figure [TT] The figure shows the ETAS model forecast, integrated over space and magnitude bins, along 
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Figure 9: Daily log-likclihood ratios between the ETAS model and the time-independent forecast (filled circles). 

The circles are color-coded by the number of events that occurred on each day. Also shown by vertical 
dashed lines are all m > 6 events during this period. 
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Figure 10: Top panel: Comparison of observed events per day (black dots) with the ETAS forecast (red dots). 

The blue dashed horizontal line represents the time-independent forecast calculated from 1986 until 
2009, while the red dashed line shows the ETAS background rate. Bottom panel: Daily probability 
gains per earthquake (black dots) and reference level of no gain, G = 1, (solid black line). 
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with the time-independent forecast (dashed blue line). Also shown are the actually observed number of 
events on each day, along with the daily probability gains per earthquake on the red axis scale on the 
right. The four earthquakes illustrate two successes and two failures of the ETAS model compared to 
the time-independent forecast. The 1986 MwGA Chalfant earthquake occurred when the ETAS forecast 
was higher than the Poisson forecast, as even on the day before the event the ETAS model outperformed 
the time-independent forecast. On the day of the event, the model underestimated the number of events 
because when the forecast was made, the large main shock had not yet occurred. In the following days, 
the ETAS model forecast the number of events better than the Poisson forecast, as measured by the large 
gains (red circles). The 1992 Miy7.3 Landers earthquake sequence shows similar properties, except that 
the ETAS model rate was slightly below the Poisson forecast on the day of the shock so that the gain was 
slightly below 1. The 1994 MwG.7 Northridge earthquake was also not well forecast by the ETAS model, 
its rate having sunk below the Poisson forecast. Nevertheless, the gains during the aftershock sequence 
are significant. The 1999 Mvi/7.1 Hector Mine earthquake, on the other hand, is a success story for the 
ETAS model. Although its total rate, summed over the region, is smaller than the Poisson forecast (as 
can be seen in Figure [TTjl . a few small quakes locally increased the ETAS rate above the Poisson forecast, 
so that the ETAS model outperformed the Poisson model on the day of the Hector Mine earthquake. 

Figure [11] can also be used to judge the fit of the ETAS model aftershock forecasts with the actual 
observed events. The fluctuations in the number of observed aftershocks are clearly larger than the 
(mean) ETAS model forecast. Even 95% confidence bounds of the Poisson distribution around the mean 
forecast cannot enclose the observations. So while the likelihood ratio and the probability gain registered 
an immense improvement of the ETAS model over the time-independent model, we did not test whether 
the observations are consistent with the ETAS model, as we did for our long-term fo recast in section 



Schorlemrner et al 



2007l |. which assumes 



14.41 If we were to apply daily the current CSEP number test 
a Poisson uncertainty in the number of events, the model would be rejected on many days. However, at 
this point it is clear that the original RELM tests are inappropriate for one-day forecasts. Rather, a first 
step in the right direction wou ld be a model-depend e nt, sim ulated likelihood distribution against which 



the observations are counted 



Werner and Sornette 



20081 ]. Because of the computational complexity, 



however, we need to leave this extension to future work. 
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Chalfant Earthquake M=6.4 Landers Earthquake M=7.3 
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Northridge Earthquake M=6.7 Hector Mine Earthquake M=7.1 
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Figure 11: Retrospective daily forecasts of the ETAS model (blue curve) during four triggered earthquake 
sequences compared with the observed number of events (blue circles) and the time-independent 
forecast (horizontal dashed blue line). On the right (red) ordinate axis, the panels show the daily 
probability gain per earthquake of the ETAS forecast over the time-independent forecast (red circles). 
The red dashed line marks the no-gain line. Vertical black dashed lines mark the day of the four 
large m > 6 earthquakes. 
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Figure 12: ETAS model forecast using a Gaussian spatial kernel and early-aftershock smoothing for the region 
surrounding the 1992 Myy7.3 Landers earthquake epicenter for the day before the main shock (27 
June), of the main shock (28 June), and the two subsequent days (29-30 June). Also shown are 
m > 2 that occurred in the week prior to the day (small black dots) and the target m > 3.95 events 
that occurred on the forecast day (white squares with grey lining). 



6.9 Observed and Modeled Spatio-Temporal Distribution of Aftershocks 

Figure [12] compares the observed events with the spatio-temporal forecast around the 1992 Mw7.3 
Landers earthquake. The panels show the forecast on the day before the main shock (27 June 1992), on 
the day of the main shock (28 June 1992) and on two subsequent days (29-30 June 1992). Also shown 
are all small events m > 2 that occurred in the previous week and contributed significantly to the ETAS 
model forecast, along with the target m > 3.95 earthquakes of each day. We observed large differences 
between the Gaussian and power-law kernels on the days after the main shock: The power-law kernel 



has a larger and smoother spatial extent than the Gaussian kernel (see Figure 7 of 



Helmstetter et al 



20071 ]). The power-law kernel forecasts a high rate in many bins in which no events occur. But on the 
other hand, it better predicted remote targets in bins unaffected by a Gaussian forecast. We preferred 
the Gaussian kernel because the estimates for the parameter fd seemed more reasonable (see section [676]) . 
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7 Time-Dependent Next-Day m > 3.95 Forecasts Based on 
the ETAS Model 

An example of an ETAS model forecast (using model 1 in Table is shown in Figure [T3] along with a 
comparison to the time-independent m > 3.95 forecast described in section [5] The plot on the left shows 
the expected number of earthquakes of magnitudes m > 3.95 in each cell on February 12, 2008. On that 
day, four earthquakes m > 3.95 occurred in roughly the same location in Baja California, Mexico, as 
part of a series of about ten swarm-like earthquakes during a two- week period. Because on previous days 
several earthquakes occurred, the ETAS model rate is locally higher than the time-independent forecast 
by a factor of almost 1, 000. Other differences between the ETAS model and the background model are 
less pronounced and mainly due to small, recent events that increased the ETAS model rate. 

For our time-dependent forecasts, we used the same magnitude distribution as for the time-independent 
forecast described in section |4~T1 including the same corner magnitude m c — 8.0. 



8 Discussion and Conclusions 

We presented two models, modified from I Helmstetter et al 



2006 



20071 ]. for estimating the probabilities 



of future earthquakes in California. The time-independent model uses an optimized adaptive kernel to 
smooth the locations of small m > 2 declustered earthquakes. The model corrects for spatial magnitude 
incompleteness and assumes a tapered Gutenberg-Richter distribution with corner magnitude equal to 
8.0. While we corrected and improved the procedure for estimating the completeness magnitude, we are 
still u nsatisfied with its performance. In t he future, we'd like to use a more robust method, such as the 



one by 



Schorlemmer and Woessner 



2008] . Another area for improvement involves the use of anisotropic 
kernels to smooth seismicity, although our method to smooth the locations of early aftershocks to forecast 
those of later ones seems to perform quite well. Finally, the earthquake catalog we used is relatively short 
(28 years) compared to average recurrence times of very large earthquakes and the crust's memory. In 
the future, we'd like to find a trade-off between high-quality recent data and lower quality older data. 

Our original long-term forecast was cast as a time-independent Poisson process. However, the distri- 
bution of the number of earthquakes in any 
scribed by a negative binomial distribution 



Kaaan 


1973 


Jackson and Kaaan, 


1999 


Schorlemmer et al. 
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ETAS Forecast (12-Feb-08) 



ETAS / Background (1 2-Feb-08) 




Figure 13: Left: ETAS model forecast for 12 February 2008 along with the four m > 3.95 observed events 
on that day (black squares, overlapping). Right: Comparison of the ETAS model forecast and the 
time-independent smoothed seismicity forecast, along with observed events that day (white squares, 
overlapping). Because of several earlier quakes, the ETAS model predicts the occurrence of the four 
observed events on that day much better than the Poisson model. 
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20091 ] or perhaps a heavy-tailed distribution 



Saichev and Sornette 



2006bJ. Retrospective tests showed 



that the Poisson assumption is violated more often than expected even for 5-year target periods. We 
therefore modified our time-independent forecast such that the distribution of the number of forecast 
events was a negative binomial distribution, with mean equal to our Poisson forecast but a variance es- 
timated from past data. The modified long-term forecast passed retrospective number consistency tests. 
We also modified the likelihood consistency test by conditioning the simulated likelihood values on the 
number of observed events. We thereby increased the test's spatial resolution and made it less sensitive 
to the number of observed events. 

While we modified the number distribution of our long-term model, we did not specify the variances in 
each individual space-magnitude bin. This inconsistency can be solved by abandoning time-independent 
forecasts and replacing the Poisson process by a time-dependent process of earthquake clustering. There- 
fore, calculating more realistic 5-year forecasts will involve simulations of (or approximations to the 
distributions of) a branching process. 

Our time-dependent model is such a branching model, but here we solely used it for average next-day 
forecasts. To compete in CSEP's one-day forecast group, we specified next-day forec asts as Poisson 
forecasts that change daily. This is a rough approximation 



Werner and Sornette 



20081, since the rates 



vary significantly within one day and bins are not spatially independent. But the extension to a simulated 
model-dependent number and likelihood distribution for each bin will be left for the future. 

The time-dependent next-day forecasts are based on the ETAS model, the parameters of which 
we optimized on retrospective forecasts using the assumed Poisson distribution. Other features that 
distinguish our implementation from other ETAS flavors include (i) a method to correct for spatial and 
temporal magnitude incompleteness, (ii) smoothing the locations of early aftershocks to forecast later 
ones, and (iii) using small m > 2 earthquakes to forecast larger m > 3.95 events. The ETAS model 
forecasts outperformed the time-independent forecast with a probability gain pe r earthquake of about 6. 



We expect our models to perfor m well, based on 



of the time-independent model by 



Helmstetter et al 



Schorlemmer et al 



2009] 's report on the success 



20071 ] in the RELM experiment. However, with 



increasing time span and the occurrence of a very large event, smoother models based on tectonic, 
geological and geodetic data might work better. Determining the reasons for the timing of such a change 
will help us build better earthquake models. 
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9 Data and Resources 



We used the Advanced National Seismic System (ANSS) earthquake catalog made publicly available by 
the Northern California Earthquake Data Center at www.ncedc.org in the period from 1 January 1981 
until 1 April 2009 wit h magnitude m > 2 and in the spatial r egion defined by the CSEP collection region, 



defined in Table 2 by 



Schorlemmer and Gerstenberaer 



2007( 1. For section 03] we used the ANSS catalog 



in the CSEP testing region from 1 January 1932 until 1 April 2009. 
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Tables 



Table 1: Results of the optimization of the spatial background model: 

Gaussian (gs) ((2J or power-law (pi) kernel JTJ. Input catalog: declustered ANSS catalog. Target catalog: 
ANSS catalog. N: number of earthquakes. L: log-likelihood G: probability gain per earthquake 
over a spatially uniform model J5). n v : optimal number of neighbors in the bandwidth of the smoothing 
kernel. The superscript ■ denotes that n v was constrained, and for *, the target m m i n = 3. 
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Table 2: Parameter estimates of the ETAS model: 

Input catalog: ANSS catalog 1/1/1981 to 4/1/2009 in the CSEP collection region (167,528 earthquakes 
m > 2). Target catalog: ANSS catalog 1/1/1986 to 4/1/2009 in the CSEP testing region (1,521 
earthquakes for m m i„ = 3.95). Spatial background model 21 from Table [T] Reference model: time- 
independent forecast model 21 and the average number /j,ti = 0.18 of daily m > 3.95. *: Using the 
corrective term p*(m) of equation (120 D . §( m=M ) : Early-aftershock-smoothing kernel for quakes above 
m > M [default m = 5.5]. §( T=t ): Early-aftershock-smoothing kernel for aftershocks occurring up to t 
days after a large event [default T = 2 days] . 
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11 E- Supplements 



11.1 Supplement 1: Removal of Explosions from the ANSS Catalog 

The Sandia National Lab published a list of official underground nuclear explosions at nuclearweaponarchive . org/usa/tests/nev 
We matched explosions to entries in the earthquake catalog whenever the events occurred within 12 sec- 
onds and within ±0.5 degree latitude and longitude. Increasing the time criterion up to 18 minutes or 
the space criterion up to ±1.5 does not change the matched events. On the other hand, 19 events are 
matched by a spatial constraint of 0.2 degrees. The two extra matches come from two explosions that 
occur on the southern edge of the Nevada Test Site (see Figure [TJ). These are well-matched in time by 
large and shallow events that are located at the center of the Site in the usual region of underground 
explosions. It is possible that either the locations of these two explosions contain an error (as they are 
exactly co- located in latitude and farther South than all other explosions), or that they triggered shallow 
collapses of the domes of past explosions. Because we did not want to forecast explosions, nor events 
caused by the collapse of domes after the explosions, nor quakes triggered by explosions, we assumed 
a location uncertainty larger than usual. Nine of these events were large m > 5, none had identifiable 
aftershock sequences and all occurred between 1984 and 1992 (see Figure 815[) . We deleted these 21 
events from the earthquake catalog. 




Figure 14: We matched events in the ANSS catalog from 1981 until 2009 with known underground nuclear ex- 
plosions: Matched ANSS events (blue filled circles), matched explosions (red crosses) and unmatched 
ANSS earthquakes (grey points). Also shown are an approximate outline of the Nevada nuclear test 
site (grey) in relation to the CSEP testing (blue line) and collection regions (black line). 
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Figure 15: Identified explosions (large blue filled circles) and earthquakes (small grey dots) in the approximate 
Nevada Test Site area in the ANSS catalog from 1981 until 2009 (see Figure [Tit. 
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11.2 Supplement 2: Optimizing the Spatial Smoothing for Different 
Magnitude Thresholds of the Target Catalog 

To show that our smoothing method performs well for various magnitude thresholds of the target catalog, 
we present further results in Table [3] When we increase the target threshold from m min = 2 to m min = 5, 
the probability gain per quake remains roughly constant. We do observe a drop for target m m i n — 5.5, 
but there are only 2 events that determine this score. At the same time, when we increase the input 
magnitude threshold from m m i n — 2 to m m i„ = 5.5, the probability gain decreases quickly, suggesting 
that small earthquakes can help forecast all seismicity, whether small events or large ones. Different target 
periods obtain different gains, but the fluctuations are much smaller than in the case of rn m i n = 4.95 
(see Table [1}. One explanation for the smaller fluctuations is the increased number of target events, but 
this needs to be investigated further. 
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Table 3: Results of the optimization of the spatial background model: 

Input catalog: Declustered ANSS catalog. Target catalog: ANSS catalog. N: number of earthquakes. 
L: log-likelihood, G: probability gain per earthquake over a spatially uniform model. n v : optimal 
number of neighbors to include in the bandwidth of the smoothing kernel. The superscript • denotes 
that n v was constrained. 
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11.3 Supplement 3: Long-term Forecast for the CSEP Mainshock-Only 
Group 

The R ELM experiment designated two forecast groups: the mainshock-on ly group and the mainshock/aftershock 



group 



Schorlemmer et al 



2007 



Schorlemmer and Gerstenberaen . 



20071 ] . The latter group comprises all 



earthquakes, while the former on ly includes so-called mainshocks selected by the Reasenberg decluster- 
ing algorithm 



Reasenberg 



1985]. Because this declustering method is relatively arbitrary, it is difficult 
to argue that forecasts should be evaluated against these events only. Nevertheless, we submit to the 
CSEP competition an updated 5-year mainshock-only forecast to provide a simple null hypothesis for 
this particular group, too. This supplement describes how we calculated the forecast. 

We first preceded exactly as for the 5-year long term forecast of all earthquakes, but we optimized 
the spatial smoothing on a declustered target catalog. Table [4] summarizes the log- likelihood scores and 
gains over a spatially uniform forecast. Comparing Table [4] to Table [TJ which lists the scores obtained 
on the original catalog, we find that the gains are uniformly a little lower, except for model 11, whose 
gain increased significantly from 2.68 to 4.55. All other models obtained slightly lower gains on the 
declustered target catalog than on the original target catalog, but the conclusion that large earthquakes 
occur on average in the locations of small ones remains valid. As before, our favorite spatial model is 
model 14 fixed n v = 6, since it uses all data and provides a slightly smoother model than one would 
obtain from the mean optimal smoothing parameter over various target periods. 

Having obtained our normalized, spatial model, we used the tapered Gutenberg-Richter distribution 
(|10|) with corner magnitude 8.0 for the entire region. Based on the declustered target catalog, we 
estimated b ~ 0.89 for California and b w 1.73 for the Geysers region (see Figure [16] and section |4~T|) . To 
obtain the number of expected events, we counted the number of events in the declustered catalog and 
divided by its length. We expect a total of N pre d = 20.51 mainshock events in the next five years from 
2010 until 2014. 

We also performed retrospective tests on our mainshock forecast (see Figure [17)1 . In contrast to the 
all-earthquake forecast, this mainshock-only forecast passes the N-test in all (overlapping) target periods 
except the 1984-1989 period, during which the forecast underpredicted. The distribution of the number 
of mainshocks in five-year periods is clearly much more Poissonian than the number of all events. As a 
result, the Poisson forecast fares much better in this forecast category than in the all-earthquake group. 
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Table 4: Results of the optimization of the spatial background model for the mainshock-only 
group: 

Input catalog: declustered ANSS catalog. Target catalog: declustered ANSS catalog. N: number of 
earthquakes. L: log-likelihood, G: probability gain per earthquake over a spatially uniform model. n v : 
optimal number of neighbors to include in the bandwidth of the smoothing kernel. The superscript ■ 
denotes that n v was constrained. 
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Indeed, one would expect one rejection in 20 (non-overlapping) target periods at 5% confidence. The 
forecast passes the L-test in every target period. This should not be surprising, given that the model is 
based on the spatial locations of the target catalogs (in this retrospective test). We did not apply the 
modified tests of section 1431 
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Figure 16: Magnitude distribution of (declustered) earthquakes in the Geysers region (blue) and the remainder 
of California (red squares) from 1 January 1981 to 1 April 2009. 
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Figure 17: Retrospective consistency tests of the 5-year time-independent mainshock-only forecast on past, 
overlapping, declustered 5-year periods with starting year as abscissa. Left (blue) ordinate axis: 
The N-test statistic <5 (blue squares), measuring the agreement between the number of expected and 
observed events, and the L-test statistic 7 (blue circles), measuring the agreement between the ex- 
pected and observed likelihood score. Right (green) ordinate axis: Number of observed events (green 
open circles), the expected number of events (green solid line), along with 95% Poisson confidence 
bounds (green dashed lines). Red outlines of the symbols mark periods during which the forecast is 
inconsistent with the observations. Grey bars denote rejection regions. 
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11.4 Supplement 4: Simplex Method for ETAS Parameter Estimation 

In this supplement, we discuss in more detail the performance of the ETAS model parameter estimation. 
The model's growing importance means that we need to thoroughly understand the likelihood estimation 
and associated uncertainties. The performance of the optimization is rarely discussed in the articles 
involving the ETAS model, although the parameter estimation is not simple. 

Figures [TS] shows the log-likelihood scores as a function of parameter values visited during the simplex 
optimization. While the actual 5-dimensional likelihood function is difficult to visualize when projected 
onto each dimension individually, the correlations between the parameters are evident. The convergence 
of the parameters as a function of the iteration is shown in Figure 1191 In Figure 1201 we projected the 
optimization steps of the simplex method onto the 2-dimensional log-likelihood surface as a function of 
a. and K. 
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Figure 18: Illustration of the simplex optimization method using model 1 in Table [2] with a Gaussian kernel: 
Value of the log likelihood as a function of each model parameter (a) - (e) and as a function of the 
number of iterations (f). Log likelihood profiles of the other models in Table [2] are more Gaussian 
and less correlated. 



Despite this evidence for a non-trivial likelihood surface near its maximum, the parameters of most 
models converged after about 100 iterations for all models for target events m > 3.95. This is in contrast 
to optimization for target events m > 2, which sometimes failed to converge after 200 iterations. For the 
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latter target set, we also encountered cases where different random number generator seeds or parameter 
starting values led to significantly different parameter estimates (not shown). 

In Figures 1181 1191 and 1201 we show the worst case scenario (models 1 and 9gs in Table [2j of the 
optimization procedures to illustrate the difficulty of the non-linear, non-Gaussian estimation problem 
of the ETAS model parameters. Most of the other models could be estimated with a more well-behaved 
likelihood function. We are relatively confident that the simplex method performed adequately for the 
purposes of the CSEP target set of m > 3.95. Most likelihood profiles were better resolved around 
its maximum than in Figure 1181 In the future, we'd like to use a method that allows us to quantify 
uncertainties in the parameter values. 
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Figure 19: Evolution of the parameter estimates with the number of iterations for model 1 in Table [2] 
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Figure 20: Log-likelihood values (as a percentage of the final value) as a function of the ETAS model parameters 
(a,K) for all points visited by the simplex method during the optimization of model 1. The large 
blue star marks the initial values; the large black star the final estimates. 
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11.5 Supplement 5: ETAS Parameter Optimization for Target Events 

m > 2 



Helmstetter et al. 



2006], wo used the same 



To compare our parameter estimates to those obtained by 
target m agnitude threshold m mi n — 2. Table [5] presents the estimates, which are similar to those 



found by 



Helmstetter et al 



200 



However, the probability gains have universally decreased from the 



previous values above 1 to values closer to 5. There may be several reasons for the smaller gain. Firstly, 



Helmstetter et al 



20061 ] studied only southern California, which includes strongly clustered 1992 Landers 
to 1999 Hector Mine sequences. Expanding the region to all of California dilutes those gains, as more 
independent earthquakes are included. Of course, strong triggering sequences are also observed outside 
of southern California, but the statistics seem to be domi nated by the strong act ivity in the Mojave 



Helmstetter et al 



20061 ] used a finer grid of 



Desert during the 90s. Secondly, our grid size is 0.1° while 
0.05°. A subtle, nonlinear trade-off probably exists between the realized gain of a time-dependent model 
and the effects of the spatial (and of course temporal) grid-size, the study region and the occurrence 
of large quakes and their triggered events. The sole objective measure of gain would be achieved on a 
global scale in continuous time. 



Table 5: Parameter estimates of the ETAS model for target threshold m„ 



2: 



Input catalog: ANSS catalog 1/1/1981 to 1/4/2009 in the CSEP collection region (167,528 earthquakes 
m > 2). Target catalog: ANSS catalog 1/1/1986 to 1/4/2009 in the CSEP testing region (123,475 
events m > 2). Spatial background model 21 from Table [TJ Reference model: time-independent 
forecast model 21 and the average number [i^l = 14.54 of daily m > 2 earthquakes. $: Using the 
corrective term p*(m). §( m = M ) : Using the early-aftershock-smoothing kernel for earthquakes above 
m > M. 
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